Abstract. We expand earlier results by Boué and Dupuis [BD98], in which stochastic control problems with
a particular cost structure, involving a relative entropy term, are shown to admit a solution by means of a
change of measure technique. We provide methods of computing the corresponding optimal control process
explicitly. Our results enable us to find solutions for optimal control problems to which the dynamic
programming principle cannot be applied. The argument is as follows. Minimization of the expectation of
a random variable with respect to the underlying probability measure, penalized by relative entropy, may
be solved exactly. In the case where the randomness is generated by a standard Brownian motion, this exact
solution can be written as a Girsanov density. An explicit expression for the control process may be
obtained in terms of the Malliavin derivative of the density process. The theory is applied to the problem of
minimizing the maximum of a Brownian motion (penalized by the relative entropy). The link to a linear
version of the Hamilton-Jacobi-Bellman equation is made for the case of diffusion processes.



1. Introduction 

In this paper we expand earlier results [BD98] that show how stochastic control problems with a
particular cost structure, involving a relative entropy term, admit a purely probabilistic solution, without
the necessity of applying the dynamic programming principle. We provide two methods to compute an
optimal control in this situation. The first method expresses the optimal control as a Malliavin
derivative. This enables us to solve control problems in which the dynamic programming principle fails. The
second method transforms the problem of finding an optimal control into a linear PDE.

Essential in our approach are the study of control problems by a change of measure technique [BD98,
Dav79, Hau86] and the useful properties of relative entropy [DE97, Section 1.4].

1.1. Background. A well-known approach to solving optimal control problems is by means of the
dynamic programming principle, leading in the stochastic, continuous time case to the
Hamilton-Jacobi-Bellman (HJB) equation, a nonlinear PDE [FR75, FS06]. Stochastic optimal control problems
with a specific cost structure may be reduced to a linear PDE [Fle82], [FS06, Chapter VI]. Such a linear
PDE is obtained by applying a logarithmic transform to the HJB equation; we will refer to this phenomenon
as a linearization of the HJB equation. This observation gained new life in recent years as it was picked
up by the physics and artificial intelligence communities to obtain Monte Carlo methods for solving
stochastic control problems [Kap05].

In this paper we show that this linearizing effect may also be obtained from a purely probabilistic
perspective. In fact the linearization is only a special case of a much wider class of probabilistic
optimization problems that are regularized by a relative entropy term [BD98]. In the setting of Markov chains the
linearizing effect of relative entropy weighted optimization was, to our knowledge, first observed in [Tod06].
By limiting arguments the connection with diffusions can be made. In Section 5 we show how this result
can be obtained without reference to dynamic programming and without the need for discretization.
First we provide an alternative way of computing an optimal control by using Malliavin derivatives, as
we show in Section 4.



1.2. Outline. The main argument is as follows. Let $(\Omega, \mathcal{F}, \mathbb{Q})$ denote a probability space and suppose
we are given a random variable $C$ on $(\Omega, \mathcal{F}, \mathbb{Q})$ indicating cost. We may change this probability measure
to a new probability measure $\mathbb{P}$ but this operation is 'penalized' by a positive factor $1/\beta$ times the relative
entropy $\mathcal{H}(\mathbb{P}; \mathbb{Q})$ of the new probability measure with respect to the old probability measure. It is a
result of direct computation that the 'optimal' probability measure, i.e. the one that minimizes
$E^{\mathbb{P}} C + \frac{1}{\beta} \mathcal{H}(\mathbb{P}; \mathbb{Q})$, has a density proportional to $\exp(-\beta C)$ with respect to $\mathbb{Q}$, as described in Section 2. This
result may also be found in [DE97].

If we specialize to the case where all the randomness is generated by a Brownian motion, then an
application of Girsanov's theorem shows that a change in probability measure depending on some 'control
process' corresponds to a relative entropy equal to quadratic control costs. Furthermore any probability
density may be obtained using such a Girsanov type change of measure, which holds in particular
for the optimal probability measure with density proportional to $\exp(-\beta C)$ with respect to $\mathbb{Q}$. This
result was obtained earlier by Boué and Dupuis [BD98]. This material is explained in detail in Section 3.
The focus of this paper is on minimizing cost functionals depending on a Brownian motion. As an
exception, in Section 3.4 an example is given of an application of the theory to processes with jumps.



An explicit expression for the optimal control process in terms of a Malliavin derivative involving the cost
random variable $C$ may be given (Section 4). This argument provides us with a new approach to solving a
class of control problems with quadratic control costs. Since the form of the cost random variable is not
restricted, this method may be applied in cases where dynamic programming (i.e. the HJB equation)
fails. For example, it is shown in Section 4.2 that the maximum of a Brownian motion with drift may be
minimized by this method, resulting in an explicit optimal control policy. This example clearly illustrates
the novelty of our approach.

The relation of our approach to classical stochastic optimal control (as in [FR75, KS91]) is explained
in Section 5. There it is shown that the solution of the state dependent optimal control problem may
be expressed as the solution of a linear PDE. This may be contrasted with the nonlinear HJB PDE that is
fundamental in classical stochastic control. As explained above, this result was obtained earlier [Fle82,
Kap05] but derived in an entirely different way, namely by a logarithmic transformation of the nonlinear
HJB equation.

To make the paper self-contained we provide some background information on relative entropy
(Appendix A).



1.3. Notation. We denote the Euclidean norm in $\mathbb{R}^n$ by $|\cdot|$. For a matrix $A \in \mathbb{R}^{n \times m}$ we write

$$\|A\| := \sup_{x \in \mathbb{R}^m, \, |x| = 1} |Ax|$$

for the usual matrix norm.

The set of all Borel measurable functions mapping a measurable space $D$ into a Borel space $E$ is denoted
by $B(D; E)$, and the set of all bounded Borel measurable functions from $D$ into $E$ is denoted by $B_b(D; E)$.
Similarly $C(\mathbb{R}^m; \mathbb{R}^n)$ denotes the space of continuous functions from $\mathbb{R}^m$ into $\mathbb{R}^n$ and $C^{1,2}(I \times \mathbb{R}^m; \mathbb{R}^n)$
denotes the space of continuous functions $f(t, x)$ from $I \times \mathbb{R}^m$ into $\mathbb{R}^n$ that are once differentiable with
respect to $t \in I \subset \mathbb{R}$ and twice differentiable with respect to $x \in \mathbb{R}^m$, with all these derivatives being
continuous.

If $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space we write $E^{\mathbb{P}}$ for expectation with respect to the probability measure $\mathbb{P}$.



2. Relative entropy weighted optimization 



Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space. Furthermore suppose a real-valued random variable $C$, bounded
from below, is given, indicating cost.

We wish to find a probability measure $\mathbb{P}$ that
(i) is absolutely continuous with respect to $\mathbb{Q}$ (denoted by $\mathbb{P} \ll \mathbb{Q}$),

(ii) minimizes the expected cost $E^{\mathbb{P}} C$, but

(iii) has minimum deviation from the original probability measure $\mathbb{Q}$. We take the relative entropy

$$\mathcal{H}(\mathbb{P}; \mathbb{Q}) = \int_\Omega \ln\left(\frac{d\mathbb{P}}{d\mathbb{Q}}\right) d\mathbb{P}$$

as a measure of this deviation (see Appendix A.1).

Note that (i) is a constraint and (ii) and (iii) are conflicting optimization targets. Weighing both the 
expected cost and the relative entropy, we arrive at the following problem: 

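Problem 2.1 (Relative entropy weighted optimization). Let $\beta > 0$. Find a probability measure $\mathbb{P} \ll \mathbb{Q}$
such that $\mathbb{P}$ minimizes the functional

$$J(\mathbb{P}) = E^{\mathbb{P}} C + \frac{1}{\beta} \mathcal{H}(\mathbb{P}; \mathbb{Q}) = E^{\mathbb{P}}\left[ C + \frac{1}{\beta} \ln \frac{d\mathbb{P}}{d\mathbb{Q}} \right]. \tag{1}$$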

Let $AC(\mathbb{Q})$ denote the set of all probability measures $\mathbb{P}$ on $(\Omega, \mathcal{F})$ such that $\mathbb{P} \ll \mathbb{Q}$. Then $AC(\mathbb{Q})$ is a
convex set. The following result, which may also be found in [DE97], says pretty much everything there
is to say about this general situation.

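Theorem 2.2. Let $\mathbb{P}^*$ be the measure given by

$$\frac{d\mathbb{P}^*}{d\mathbb{Q}} = Z^* := \frac{\exp(-\beta C)}{E^{\mathbb{Q}} \exp(-\beta C)}. \tag{2}$$

Then

(i) for any $\mathbb{P} \in AC(\mathbb{Q})$, we have

$$J(\mathbb{P}) = \frac{1}{\beta} \mathcal{H}(\mathbb{P}; \mathbb{P}^*) - \frac{1}{\beta} \ln E^{\mathbb{Q}} \exp(-\beta C). \tag{3}$$

In particular,

(ii) $J$ is a strictly convex function over $AC(\mathbb{Q})$,

(iii) $\mathbb{P}^*$ solves Problem 2.1, and

(iv)

$$J(\mathbb{P}^*) = -\frac{1}{\beta} \ln\left( E^{\mathbb{Q}} \exp(-\beta C) \right). \tag{4}$$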



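Proof. We prove (i); the other results follow immediately from the properties of relative entropy
(Proposition A.5). Write $K = E^{\mathbb{Q}} \exp(-\beta C)$. To see (3), note that for $d\mathbb{P}/d\mathbb{Q} = Z$,

$$\mathcal{H}(\mathbb{P}; \mathbb{P}^*) = \int_\Omega Z (\ln Z - \ln Z^*) \, d\mathbb{Q} = \mathcal{H}(\mathbb{P}; \mathbb{Q}) - \int_\Omega Z \ln Z^* \, d\mathbb{Q}$$

$$= \mathcal{H}(\mathbb{P}; \mathbb{Q}) + \int_\Omega Z (\ln K + \beta C) \, d\mathbb{Q} = \mathcal{H}(\mathbb{P}; \mathbb{Q}) + \ln K + \beta E^{\mathbb{Q}}[ZC].$$

Hence

$$J(\mathbb{P}) = E^{\mathbb{Q}}[ZC] + \frac{1}{\beta} \mathcal{H}(\mathbb{P}; \mathbb{Q}) = \frac{1}{\beta} \mathcal{H}(\mathbb{P}; \mathbb{P}^*) - \frac{1}{\beta} \ln K.$$

□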
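Theorem 2.2 can be checked numerically on a finite sample space, where measures are probability vectors and the relative entropy is a finite sum. The following is a minimal sketch under that assumption; the three-point space, the cost values and all function names are hypothetical choices made for illustration only.

```python
import math

def gibbs_measure(q, cost, beta):
    """Return (P*, K): P* has density exp(-beta*C)/K w.r.t. q, K = E^Q exp(-beta*C)."""
    weights = [qi * math.exp(-beta * ci) for qi, ci in zip(q, cost)]
    k = sum(weights)
    return [w / k for w in weights], k

def relative_entropy(p, q):
    """H(p; q) = sum_i p_i ln(p_i / q_i), with the convention 0 ln 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective(p, q, cost, beta):
    """J(P) = E^P C + (1/beta) H(P; Q), as in (1)."""
    expected_cost = sum(pi * ci for pi, ci in zip(p, cost))
    return expected_cost + relative_entropy(p, q) / beta

# Hypothetical data: a three-point sample space.
q = [0.2, 0.5, 0.3]          # reference measure Q
cost = [1.0, 0.0, 2.5]       # cost random variable C
beta = 1.7
p_star, k = gibbs_measure(q, cost, beta)

p_other = [0.6, 0.3, 0.1]    # an arbitrary competitor P << Q
lhs = objective(p_other, q, cost, beta)
rhs = relative_entropy(p_other, p_star) / beta - math.log(k) / beta  # identity (3)
```

Here `lhs` and `rhs` agree up to rounding, which is identity (3), and `objective(p_star, q, cost, beta)` equals `-math.log(k) / beta`, which is (4).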



3. Dynamic relative entropy weighted optimization 

In this section, we consider the important special situation where all randomness is generated by a
multi-dimensional Brownian motion. Changes of measure (satisfying mild conditions) may in this case
be expressed as a Girsanov type transformation (see Lemma 3.4 below). The stochastic process
appearing in the exponent of the Girsanov density will constitute the 'control process'. Another crucial
observation is that the relative entropy of such a transformation is given by the squared control costs
(Proposition 3.8 (i)).



Also we will derive different (but obviously related) explicit expressions for the optimal control process,
namely as a time derivative of an expectation (Proposition 3.8 (iii)), as a Malliavin derivative
(Theorem 4.2) and in the following section as the derivative of a solution to a PDE (Theorem 5.6).



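3.1. Setting. Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space. Let $W = (W_t)_{0 \le t < \infty}$ define a $\mathbb{Q}$-standard Brownian
motion in $\mathbb{R}^p$. Let $(\mathcal{F}_t)_{0 \le t < \infty}$ denote the filtration generated by $W$, and let $\mathcal{F}_\infty := \sigma\left( \cup_{t \ge 0} \mathcal{F}_t \right)$.

Let $\mathcal{U}$ denote the set of $\mathbb{R}^p$-valued progressively measurable stochastic processes $U$ such that the
process $(Z_t^U)$ defined by

$$Z_t^U := \exp\left( \sum_{i=1}^p \int_0^t U_s^i \, dW_s^i - \frac{1}{2} \int_0^t |U_s|^2 \, ds \right)$$

is a martingale. In particular if $U$ satisfies the Novikov condition,

$$E^{\mathbb{Q}} \exp\left( \frac{1}{2} \int_0^t |U_s|^2 \, ds \right) < \infty, \quad t \ge 0,$$

then $U \in \mathcal{U}$ (see [KS91, Proposition 3.5.12]). The set $\mathcal{U}$ will be called the set of controls and $U \in \mathcal{U}$ will
be a control process.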

By Girsanov's theorem [KS91, Theorem 3.5.1], there exists a probability measure $\mathbb{P}^U$ defined by the
Radon-Nikodym derivative

$$\frac{d\mathbb{P}^U}{d\mathbb{Q}} = \exp\left( \sum_{i=1}^p \int_0^\infty U_s^i \, dW_s^i - \frac{1}{2} \int_0^\infty |U_s|^2 \, ds \right) \tag{5}$$

with respect to which the process

$$t \mapsto W_t^U := W_t - \int_0^t U_s \, ds, \quad t \ge 0, \tag{6}$$

is a standard Brownian motion. Let $E^U$ be a shorthand notation for $E^{\mathbb{P}^U}$.



Suppose a random variable $C$ indicating cost is provided, which is bounded from below and square
integrable with respect to $\mathbb{Q}$. Define the cost function by

$$J(U) := E^U C + \frac{1}{\beta} \mathcal{H}(\mathbb{P}^U; \mathbb{Q}) = E^U\left[ C + \frac{1}{2\beta} \int_0^\infty |U_s|^2 \, ds \right], \tag{7}$$

where the equality is a result of Proposition 3.8 (i). We consider the following problem.



Problem 3.1 (Dynamic relative entropy weighted optimization). Find the optimal value $J^*$ defined by

$$J^* := \inf_{U \in \mathcal{U}} J(U), \tag{8}$$

and, provided it exists, a unique minimizer $U^* \in \arg\min_{U \in \mathcal{U}} J(U)$.

Note the similarity to Problem 2.1. The main difference between the two problems is that in
Problem 3.1 we restrict the possible probability measures to those parametrized by $U \in \mathcal{U}$, through their
density given by (5).

3.2. Main result. We are now ready to state the main result of this section. First we collect the ingredi- 
ents. 

Hypothesis 3.2. (i) Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space, on which a $p$-dimensional standard
Brownian motion $(W_t)_{t \ge 0}$ is defined;

(ii) Let $(\mathcal{F}_t)_{t \ge 0}$ be the filtration generated by $W$;

(iii) Let $C$ be an $\mathcal{F}_\infty$-measurable random variable which is bounded from below and such that
$E^{\mathbb{Q}} |C| < \infty$;

(iv) Let $\beta > 0$.
Theorem 3.3. Suppose the conditions of Hypothesis 3.2 are satisfied. Then there exists a square integrable,
$(\mathcal{F}_t)_{t \ge 0}$-adapted stochastic process $U^* \in \mathcal{U}$ that solves Problem 3.1. The corresponding probability
measure $\mathbb{P}^* := \mathbb{P}^{U^*}$ is given by (2) and the optimal value is given by (4). In particular $\mathbb{P}^*$ is optimal among all
probability measures that are equivalent with $\mathbb{Q}$, in the sense that it solves Problem 2.1. If $V$ is another
stochastic process solving Problem 3.1 then $U^*$ and $V$ are indistinguishable.



Before we prove Theorem 3.3 we provide a key lemma that is most helpful in establishing the existence 
of an optimal U. 

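Lemma 3.4. Suppose $(\Omega, \mathcal{F}, \mathbb{Q})$ is a probability space on which a $d$-dimensional standard Brownian
motion $(B_t)_{t \ge 0}$ is defined. Let $(\mathcal{F}_t)_{t \ge 0}$ be the complete filtration (i.e. including all null-sets) generated
by $(B_t)_{t \ge 0}$.

Suppose a nonnegative random variable $Z$ is given such that

(i) $E^{\mathbb{Q}} |\ln Z| < \infty$;

(ii) $Z$ is $\mathcal{F}_\infty$-measurable;

(iii) $E^{\mathbb{Q}} Z = 1$.

Let $\mathbb{P}$ be a probability measure with density $Z$ with respect to $\mathbb{Q}$, so $\frac{d\mathbb{P}}{d\mathbb{Q}} = Z$, and define the conditional
density process $(Z_t)$ by $Z_t := E^{\mathbb{Q}}[Z \mid \mathcal{F}_t]$, $t \ge 0$. Then $Z_t$ is $\mathbb{Q}$-a.s. continuous. Furthermore there exists a
unique $\mathbb{R}^d$-valued, progressively measurable stochastic process $\theta$ such that

(a) the following expression holds:

$$Z_t = \exp\left( \sum_{i=1}^d \int_0^t \theta_s^i \, dB_s^i - \frac{1}{2} \int_0^t |\theta_s|^2 \, ds \right), \quad \text{for all } t \ge 0, \ \mathbb{Q}\text{-almost surely}; \tag{9}$$

(b) the process $\tilde{B}_t := B_t - \int_0^t \theta_s \, ds$, $t \ge 0$, is a $\mathbb{P}$-Brownian motion;

(c) $\theta$ is square integrable, and $E^{\mathbb{Q}}\left[ \frac{1}{2} \int_0^\infty |\theta_s|^2 \, ds \right] = -E^{\mathbb{Q}} \ln Z$.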

Proof. Define a uniformly integrable martingale $(Z_t)$ by $Z_t := E^{\mathbb{Q}}[Z \mid \mathcal{F}_t]$. By the martingale
representation theorem for Brownian martingales [Kal02, Theorem 18.10], $(Z_t)$ is $\mathbb{Q}$-a.s. continuous (and therefore
progressively measurable) and there exists a unique $\mathbb{R}^d$-valued, progressively measurable process $\varphi$
satisfying $\int_0^\infty |\varphi_s|^2 \, ds < \infty$, $\mathbb{Q}$-a.s., such that

$$Z_t = 1 + \sum_{i=1}^d \int_0^t \varphi_s^i \, dB_s^i, \quad t \ge 0, \ \mathbb{Q}\text{-almost surely}. \tag{10}$$

The condition $E^{\mathbb{Q}} |\ln Z| < \infty$ implies that $Z > 0$, $\mathbb{Q}$-almost surely. By Lemma 3.5, $Z_t > 0$ for all $t \ge 0$
almost surely. Ignoring the $\mathbb{Q}$-null set where $Z_t = 0$, define $\theta_t := \varphi_t / Z_t$ for $t \ge 0$. Note that $\theta$ is progressively
measurable, since $\varphi$ and $Z$ are. With this choice of $\theta$, we have $dZ_t = \sum_{i=1}^d \theta_t^i Z_t \, dB_t^i$, for $t \ge 0$, with
solution (9), $\mathbb{Q}$-almost surely. Part (b) is then a direct consequence of Girsanov's theorem (see [Kal02,
Theorem 18.19]). Since $E^{\mathbb{Q}} |\ln Z| < \infty$ we may compute $-E^{\mathbb{Q}} \ln Z$ as

$$-E^{\mathbb{Q}} \ln Z = E^{\mathbb{Q}}\left[ \frac{1}{2} \int_0^\infty |\theta_s|^2 \, ds - \sum_{i=1}^d \int_0^\infty \theta_s^i \, dB_s^i \right] = \frac{1}{2} E^{\mathbb{Q}} \int_0^\infty |\theta_s|^2 \, ds.$$

□



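In the proof of Lemma 3.4 we used the following lemma.

Lemma 3.5. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{Q})$ be a filtered probability space and let $Z$ be a $\mathbb{Q}$-density on $\Omega$, i.e.
$Z : \Omega \to \mathbb{R}$, $Z \ge 0$, $\mathbb{Q}$-a.s. and $E^{\mathbb{Q}} Z = 1$.

If $Z > 0$, $\mathbb{Q}$-a.s., then $\inf_{t \ge 0} Z_t > 0$, $\mathbb{Q}$-a.s., where $Z_t := E^{\mathbb{Q}}[Z \mid \mathcal{F}_t]$.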
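Proof. Define $\mathbb{P}$ by $\frac{d\mathbb{P}}{d\mathbb{Q}} = Z$. By [Kal02, Lemma 18.17], $\inf_{0 \le t \le n} Z_t > 0$, $\mathbb{P}$-a.s. for all $n > 0$. Therefore

$$A = \left\{ \inf_{t \ge 0} Z_t = 0 \right\} = \bigcup_{n \in \mathbb{N}} \left\{ \inf_{0 \le t \le n} Z_t = 0 \right\}$$

is a $\mathbb{P}$-null set. We have $E^{\mathbb{Q}}[Z \mathbb{1}_A] = \mathbb{P}(A) = 0$. If $\mathbb{Q}(A) > 0$, then $Z(\omega) = 0$ for $\mathbb{Q}$-almost all $\omega \in A$, which is a
contradiction. Hence $\mathbb{Q}(A) = 0$. □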



The condition $E^{\mathbb{Q}} |\ln Z| < \infty$ excludes the situation where $\mathbb{Q}(Z = 0) > 0$. Note in particular that if $Z \ge c$, $\mathbb{Q}$-a.s.
for some $c > 0$, then by Lemma 3.4 we have $E^{\mathbb{Q}}\left[ \frac{1}{2} \int_0^\infty |\theta_s|^2 \, ds \right] \le -\ln c$. To illustrate the condition
$E^{\mathbb{Q}} |\ln Z| < \infty$ we provide an example where this condition is not satisfied, and find that $\mathbb{Q}(Z_t = 0) > 0$ and therefore
the definition of $\theta$ becomes problematic. This should not be surprising, since the expression (9) cannot
become zero for well-behaved (i.e. square integrable) $\theta$.



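Example 3.6 (Where the density process becomes zero). Suppose, for some $a \in \mathbb{R}$ and a standard
Brownian motion $B$,

$$Z = \begin{cases} k & \text{if } B_T > a, \\ 0 & \text{otherwise,} \end{cases}$$

where $k$ is a normalizing constant and $T > 0$. Note that $\mathbb{Q}(Z = 0) > 0$ and hence $E^{\mathbb{Q}} |\ln Z| = \infty$. We
compute

$$Z_t = E^{\mathbb{Q}}[Z \mid \mathcal{F}_t] = k \, \mathbb{Q}[B_{T-t} > a - x] \big|_{x = B_t} = \frac{k}{\sqrt{2\pi}} \int_{(a - B_t)/\sqrt{T-t}}^\infty \exp(-\xi^2/2) \, d\xi, \quad 0 \le t < T,$$

and

$$Z_t = E^{\mathbb{Q}}[Z \mid \mathcal{F}_t] = \begin{cases} k & \text{if } B_T > a, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for } t \ge T.$$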



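The following lemma will be used to establish uniqueness of the solution.

Lemma 3.7. Suppose $U$ and $V$ are $\mathbb{R}^p$-valued stochastic processes in $\mathcal{U}$. Then

$$\mathcal{H}(\mathbb{P}^U; \mathbb{P}^V) = E^U\left[ \frac{1}{2} \int_0^\infty |U_s - V_s|^2 \, ds \right].$$

In particular, if $\mathbb{P}^U = \mathbb{P}^V$ then $U$ and $V$ are indistinguishable.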



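Proof. We compute, for simplicity in the case $p = 1$,

$$\mathcal{H}(\mathbb{P}^U; \mathbb{P}^V) = E^U \ln\left( \frac{d\mathbb{P}^U}{d\mathbb{P}^V} \right) = E^U \ln\left( \frac{d\mathbb{P}^U}{d\mathbb{Q}} \frac{d\mathbb{Q}}{d\mathbb{P}^V} \right)$$

$$= E^U\left[ \int_0^\infty U_s \, dW_s - \frac{1}{2} \int_0^\infty U_s^2 \, ds - \int_0^\infty V_s \, dW_s + \frac{1}{2} \int_0^\infty V_s^2 \, ds \right]$$

$$= E^U\left[ \int_0^\infty U_s \, dW_s^U + \frac{1}{2} \int_0^\infty U_s^2 \, ds - \int_0^\infty V_s \, dW_s^U - \int_0^\infty U_s V_s \, ds + \frac{1}{2} \int_0^\infty V_s^2 \, ds \right]$$

$$= E^U \frac{1}{2} \int_0^\infty (U_s - V_s)^2 \, ds.$$

□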



Proof of Theorem 3.3. As already noted, Problem 3.1 is a version of Problem 2.1 but restricted to the set
of probability measures $\mathbb{P}^U$ for $U \in \mathcal{U}$. So if we can find a square integrable $U \in \mathcal{U}$ for
which $\mathbb{P}^U$ has the density function given by (2), i.e. one that solves Problem 2.1 because of Theorem 2.2,
then it only remains to show uniqueness of such a $U$.

Therefore we simply define our candidate density function $Z^*$ by (2). Note that $Z^*$ is $\mathbb{Q}$-square
integrable since $C$ is bounded from below. Note that $E^{\mathbb{Q}} \exp(-\beta C) < \infty$ since $C$ is bounded from below.
Since $E^{\mathbb{Q}} |C| < \infty$ it follows that $\mathbb{Q}(C < \infty) = 1$, so that $E^{\mathbb{Q}} \exp(-\beta C) > 0$. Hence

$$E^{\mathbb{Q}} |\ln Z^*| = E^{\mathbb{Q}} \left| -\beta C - \ln E^{\mathbb{Q}} \exp(-\beta C) \right| \le \beta E^{\mathbb{Q}} |C| + \left| \ln E^{\mathbb{Q}} \exp(-\beta C) \right| < \infty.$$

By Lemma 3.4 there exists a square integrable, adapted, $\mathbb{R}^p$-valued stochastic process $U^*$ such that
$Z_t^* := E^{\mathbb{Q}}[Z^* \mid \mathcal{F}_t]$ has the form (9). Since by definition $(Z_t^*)$ is a martingale, $U^* \in \mathcal{U}$. By Lemma 3.7, $U^*$
is unique up to indistinguishability. The expression for the optimal value follows directly from
Theorem 2.2. □



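Here we give some useful equalities, which hold for any $U \in \mathcal{U}$ (so in particular for $U^*$).

Proposition 3.8. The following relations hold for any $U \in \mathcal{U}$.

(i) $\mathcal{H}(\mathbb{P}^U; \mathbb{Q}) = E^U\left[ \frac{1}{2} \int_0^\infty |U_s|^2 \, ds \right]$;

(ii) $E^{\mathbb{Q}}[Z_\infty^U W_r \mid \mathcal{F}_t] = Z_t^U W_t + \int_t^r E^{\mathbb{Q}}[U_s Z_s^U \mid \mathcal{F}_t] \, ds$ for $0 \le t \le r$;

(iii) $U_t = \frac{1}{Z_t^U} \left. \frac{d}{dr} E^{\mathbb{Q}}[Z_\infty^U W_r \mid \mathcal{F}_t] \right|_{r = t}$ a.s. for $t \ge 0$.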



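Proof. (i): This is a direct consequence of Lemma 3.7 by taking $V = 0$.

(ii): In the remainder of the proof we fix $U \in \mathcal{U}$ and omit the superscripts $U$ in $Z^U$ etc. Let $t \ge 0$ and
$r \ge t$. Define a stochastic process $(Y_s)_{s \ge t}$ by

$$Y_s = \begin{cases} W_s, & t \le s \le r, \\ W_r, & s > r, \end{cases}$$

and note that $Y$ satisfies the equation $dY_s = \mathbb{1}_{s \le r} \, dW_s$ for $s \ge t$. The process $(Z_s)$ is the stochastic
exponential of $U$, so $dZ_s = \sum_{i=1}^p U_s^i Z_s \, dW_s^i$. Then using Itô's formula,

$$d(Y_s Z_s) = (\mathbb{1}_{s \le r} Z_s + U_s Z_s Y_s) \, dW_s + \mathbb{1}_{s \le r} U_s Z_s \, ds,$$

so that

$$E^{\mathbb{Q}}[Z_\infty W_r \mid \mathcal{F}_t] = E^{\mathbb{Q}}[Z_\infty Y_\infty \mid \mathcal{F}_t] = Z_t Y_t + \int_t^\infty \mathbb{1}_{s \le r} E^{\mathbb{Q}}[U_s Z_s \mid \mathcal{F}_t] \, ds = Z_t W_t + \int_t^r E^{\mathbb{Q}}[U_s Z_s \mid \mathcal{F}_t] \, ds.$$

(iii): Taking the derivative of relation (ii) with respect to $r$ and evaluating at $r = t$,

$$\left. \frac{d}{dr} E^{\mathbb{Q}}[Z_\infty W_r \mid \mathcal{F}_t] \right|_{r = t} = \left. E^{\mathbb{Q}}[U_r Z_r \mid \mathcal{F}_t] \right|_{r = t} = U_t Z_t.$$

□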



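3.3. Example: $C = \frac{1}{2}(W_T - x^*)^2$. We illustrate the theory developed so far on a simple example. Let
$T > 0$. Let $(W_t)_{0 \le t \le T}$ denote a standard Brownian motion and consider the process $X_t := W_t$. (After a
Girsanov change of measure, $X$ will have become a Brownian motion with drift.) For the cost variable
$C := \frac{1}{2}(X_T - x^*)^2$, we have $Z^* \propto \exp(-\beta C) = \exp(-\frac{1}{2} \beta (W_T - x^*)^2)$. Letting $k$ denote a normalization
constant, we compute

$$Z_t^* = E^{\mathbb{Q}}[Z^* \mid W_t = x] = k E^{\mathbb{Q}}\left[ \exp\left( -\tfrac{1}{2} \beta (W_T - x^*)^2 \right) \mid X_t = x \right] = k E^{\mathbb{Q}}\left[ \exp\left( -\tfrac{1}{2} \beta (W_{T-t} + x - x^*)^2 \right) \right]$$

$$= \frac{k}{\sqrt{\rho(t)}} \exp\left( -\tfrac{1}{2} \beta (W_t - x^*)^2 / \rho(t) \right),$$

where $\rho(t) := 1 + \beta (T - t)$, $0 \le t \le T$. In this computation we used the Markov property of $W$ and
Lemma A.13 (i).

Using Itô's formula,

$$dZ_t^* = -\frac{\beta (W_t - x^*)}{\rho(t)} Z_t^* \, dW_t = \frac{\beta (x^* - W_t)}{\rho(t)} Z_t^* \, dW_t,$$

or, equivalently,

$$Z_t^* = \exp\left( \int_0^t U_s^* \, dW_s - \frac{1}{2} \int_0^t (U_s^*)^2 \, ds \right), \quad 0 \le t \le T,$$

where $U_t^* = \frac{\beta (x^* - W_t)}{\rho(t)}$, $0 \le t \le T$, so that

$$\frac{d\mathbb{P}^{U^*}}{d\mathbb{Q}} = Z_T^*.$$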
By Theorem 3.3 the process $U^*$ minimizes

$$J(U) = E^U\left[ \frac{1}{2} (X_T - x^*)^2 + \frac{1}{2\beta} \int_0^T U_s^2 \, ds \right]$$

over $U \in \mathcal{U}$, where $X$ satisfies the SDE $dX_t = dW_t = U_t \, dt + dW_t^U$ with $W^U$ a Brownian motion under
$\mathbb{P}^U$. This is a first example where the optimal solution is computed without application of the
Hamilton-Jacobi-Bellman equation.
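For this example the optimal value in (4) is available in closed form, since $E^{\mathbb{Q}} \exp(-\frac{1}{2} \beta (W_T - x^*)^2) = (1 + \beta T)^{-1/2} \exp(-\frac{1}{2} \beta (x^*)^2 / (1 + \beta T))$, which is the normalization $Z_0^* = 1$ read off from the expression for $Z_t^*$. The sketch below compares this closed form with a plain trapezoid quadrature, and also estimates $J(U^*)$ by an Euler discretization of the controlled dynamics. The function names, grid sizes and the Euler scheme are assumptions made for illustration; they are not part of the theory above.

```python
import math
import random

def optimal_value_closed_form(beta, T, x_star):
    """J* = -(1/beta) ln E^Q exp(-beta C) for C = (W_T - x*)^2 / 2."""
    return math.log(1.0 + beta * T) / (2.0 * beta) + 0.5 * x_star ** 2 / (1.0 + beta * T)

def optimal_value_quadrature(beta, T, x_star, n=20001, width=10.0):
    """Same quantity, with E^Q computed by the trapezoid rule over W_T ~ N(0, T)."""
    lo, hi = -width * math.sqrt(T), width * math.sqrt(T)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        w = lo + i * h
        density = math.exp(-w * w / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)
        value = math.exp(-0.5 * beta * (w - x_star) ** 2) * density
        total += value * (0.5 if i in (0, n - 1) else 1.0)
    return -math.log(total * h) / beta

def simulated_cost(beta, T, x_star, n_paths=2000, n_steps=200, seed=0):
    """Monte Carlo estimate of J(U*) with U*_t = beta (x* - X_t) / rho(t), Euler scheme."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, running = 0.0, 0.0
        for k in range(n_steps):
            u = beta * (x_star - x) / (1.0 + beta * (T - k * dt))
            running += u * u * dt / (2.0 * beta)          # quadratic control cost
            x += u * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)  # dX = U dt + dW^U
        total += 0.5 * (x - x_star) ** 2 + running
    return total / n_paths

# Hypothetical parameter values.
beta, T, x_star = 2.0, 1.0, 1.5
exact = optimal_value_closed_form(beta, T, x_star)
numeric = optimal_value_quadrature(beta, T, x_star)
estimate = simulated_cost(beta, T, x_star)
```

The quadrature value should match the closed form to high accuracy, while the simulated cost agrees only up to Monte Carlo and discretization error.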

3.4. Processes with jumps. The procedure to obtain a solution to a relative entropy weighted optimiza- 
tion problem may in principle be repeated for other stochastic processes, such as jump processes. We 
will illustrate this for an example. 

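Let $N$ be a Poisson random measure on $\mathbb{R}$ with intensity measure $\nu$ satisfying $\int_{\mathbb{R}} \nu(dz) < \infty$. Let the cost
variable be given by $C := X_T$, where $X_t := \int_0^t \int_{\mathbb{R}} \gamma(z) \, N(ds, dz)$, for $0 \le t \le T$. Let $\beta > 0$. By Theorem 2.2
the density $Z^* \propto \exp(-\beta C)$ is optimal in the sense that it solves Problem 2.1. We wish to see what effect
this change of density has on the dynamics.

We compute

$$E^{\mathbb{Q}}[\exp(-\beta C) \mid \mathcal{F}_t] = \exp(-\beta X_t) \sum_{n=0}^\infty \mathbb{Q}(N([t, T], \mathbb{R}) = n) \prod_{i=1}^n \int_{\mathbb{R}} \exp(-\beta \gamma(z)) \, \nu(dz) / \nu(\mathbb{R})$$

$$= \exp(-\beta X_t) \sum_{n=0}^\infty \frac{e^{-\nu(\mathbb{R})(T - t)} (\nu(\mathbb{R})(T - t))^n}{n!} \left( \int_{\mathbb{R}} \exp(-\beta \gamma(z)) \, \nu(dz) / \nu(\mathbb{R}) \right)^n$$

$$= \exp(-\beta X_t - \nu(\mathbb{R})(T - t)) \sum_{n=0}^\infty \frac{\left( \int_{\mathbb{R}} \exp(-\beta \gamma(z)) \, \nu(dz) \, (T - t) \right)^n}{n!}$$

$$= \exp\left( -\beta \int_0^t \int_{\mathbb{R}} \gamma(z) \, N(ds, dz) + (T - t) \int_{\mathbb{R}} \{ \exp(-\beta \gamma(z)) - 1 \} \, \nu(dz) \right).$$

For the optimal density process this gives

$$Z_t^* := E^{\mathbb{Q}}[\exp(-\beta C) \mid \mathcal{F}_t] / E^{\mathbb{Q}}[\exp(-\beta C)] = \exp\left( -\beta \int_0^t \int_{\mathbb{R}} \gamma(z) \, N(ds, dz) - t \int_{\mathbb{R}} \{ \exp(-\beta \gamma(z)) - 1 \} \, \nu(dz) \right).$$

An application of Girsanov's theorem for jump processes [ØS07, Theorem 1.35] gives that the random
measure

$$\tilde{N}^*(dt, dz) = -\exp(-\beta \gamma(z)) \, \nu(dz) \, dt + N(dt, dz)$$

is the compensated random measure corresponding to $N(dt, dz)$ under the optimal probability
measure $\mathbb{P}^*$ (as prescribed by Theorem 3.3). In particular, the intensity measure of $N(dt, dz)$ under $\mathbb{P}^*$ is
$\nu^*(dz) := \exp(-\beta \gamma(z)) \, \nu(dz)$. The relative entropy of $\mathbb{P}^*$ with respect to $\mathbb{Q}$ may be computed as

$$\mathcal{H}(\mathbb{P}^*; \mathbb{Q}) = E^* \ln Z_T^* = E^*\left[ -\beta \int_0^T \int_{\mathbb{R}} \gamma(z) \, N(ds, dz) - T \int_{\mathbb{R}} \{ \exp(-\beta \gamma(z)) - 1 \} \, \nu(dz) \right]$$

$$= -\beta T \int_{\mathbb{R}} \gamma(z) \exp(-\beta \gamma(z)) \, \nu(dz) - T \int_{\mathbb{R}} \{ \exp(-\beta \gamma(z)) - 1 \} \, \nu(dz)$$

$$= T \int_{\mathbb{R}} \left( 1 - \exp(-\beta \gamma(z)) (1 + \beta \gamma(z)) \right) \nu(dz).$$

A numerical experiment is shown in Figure 1. Note that the expression $\exp(-\beta \ldots)$ appears again in the
expression for the optimal intensity measure. In this simple example explicit computations are possible.
We aim to extend the theory developed in this paper to general stochastic processes in the near future.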
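The tilted intensity and the entropy formula of this subsection can be illustrated for an intensity measure with finitely many atoms. The sketch below assumes such a discrete $\nu$, represented as a dictionary from jump size to mass, and compares the closed-form entropy with the standard expression $T \int (\nu^* \ln(\nu^*/\nu) - \nu^* + \nu)$ for the relative entropy between two Poisson random measures with finite intensities. All names and numerical values are hypothetical.

```python
import math

def tilted_intensity(nu, gamma, beta):
    """Optimal intensity nu*(z) = exp(-beta * gamma(z)) * nu(z), atom by atom."""
    return {z: math.exp(-beta * gamma(z)) * mass for z, mass in nu.items()}

def entropy_closed_form(nu, gamma, beta, T):
    """H(P*; Q) = T * sum_z (1 - exp(-beta*gamma(z)) * (1 + beta*gamma(z))) * nu(z)."""
    return T * sum(
        (1.0 - math.exp(-beta * gamma(z)) * (1.0 + beta * gamma(z))) * mass
        for z, mass in nu.items()
    )

def entropy_from_intensities(nu_star, nu, T):
    """Relative entropy of a Poisson random measure with intensity nu_star w.r.t. nu."""
    return T * sum(
        ns * math.log(ns / nu[z]) - ns + nu[z] for z, ns in nu_star.items()
    )

# Hypothetical discrete intensity measure: jump sizes 0.5, 1 and 2.
nu = {0.5: 2.0, 1.0: 1.0, 2.0: 0.5}
gamma = lambda z: z           # cost per jump equals the jump size
beta, T = 1.0, 3.0

nu_star = tilted_intensity(nu, gamma, beta)
h_closed = entropy_closed_form(nu, gamma, beta, T)
h_direct = entropy_from_intensities(nu_star, nu, T)
```

With $\gamma > 0$ every atom of `nu_star` is smaller than the corresponding atom of `nu`, so the controlled process jumps less often, and large jumps are suppressed most strongly.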

Figure 1. Illustration of the example of Section 3.4. Sample paths are shown
of a controlled Lévy process ($\gamma(z) = z$) for different values of $\beta$ (legend: uncontrolled, $\beta = 0$;
controlled, $\beta = 1$; controlled, $\beta = 2$). Here $\nu(dz) = f(z) \, dz$, with $f(z) = a z^{-1/2} \exp(-\delta z)$.
This results in $\nu^*(dz) = f^*(z) \, dz$ with $f^*(z) = a z^{-1/2} \exp(-(\beta + \delta) z)$. Parameter values are
$a = 10$, $\delta = 5$. The sample paths for different values of $\beta$ are constructed using the same underlying
pseudorandomly generated numbers so that the sample paths may be compared. Note that for higher $\beta$, there are
fewer jumps and the jump sizes are slightly smaller.

4. The optimal control as a Malliavin derivative

Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space on which a $p$-dimensional standard Brownian motion $(W_t)_{0 \le t \le T}$ is
defined, with $T < \infty$. Let $(\mathcal{F}_t)_{t \ge 0}$ be the filtration generated by $W$. In this section we write $D_t F$ for the
Malliavin derivative of an $\mathcal{F}_T$-measurable random variable $F$ at time $t$ and $\mathbb{D}^{1,q}$ for the domain of $D$ in
$L^q(\Omega)$, $q \ge 1$. See [DNØP09], [Nua06] for details.

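The following lemma is a consequence of [Nua06, Proposition 1.2.8].

Lemma 4.1. Suppose $F \in \mathbb{D}^{1,2}$. Then $E^{\mathbb{Q}}[F \mid \mathcal{F}_t] \in \mathbb{D}^{1,2}$ for $0 \le t \le T$ and $D_t E^{\mathbb{Q}}[F \mid \mathcal{F}_t] = E^{\mathbb{Q}}[D_t F \mid \mathcal{F}_t]$.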

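Theorem 4.2. Suppose $\mathbb{P}$ is absolutely continuous with respect to $\mathbb{Q}$ with Radon-Nikodym derivative
$\frac{d\mathbb{P}}{d\mathbb{Q}} = Z$, where $Z$ is $\mathcal{F}_T$-measurable for some $T > 0$. Let $Z_t := E^{\mathbb{Q}}[Z \mid \mathcal{F}_t]$, $0 \le t \le T$, denote the density
process. Suppose $Z \in \mathbb{D}^{1,2}$. Then $Z_t \in \mathbb{D}^{1,2}$ for all $0 \le t \le T$. Define a stochastic process $V$ by

$$V_t := D_t \ln Z_t = \frac{D_t Z_t}{Z_t} = \frac{E^{\mathbb{Q}}[D_t Z \mid \mathcal{F}_t]}{E^{\mathbb{Q}}[Z \mid \mathcal{F}_t]}, \quad 0 \le t \le T. \tag{11}$$

Then $Z = \exp\left( \int_0^T V_s \, dW_s - \frac{1}{2} \int_0^T V_s^2 \, ds \right)$.

It is interesting to compare expression (11) to Proposition 3.8 (iii).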
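Proof. Note that

$$D_s Z_s = D_s E^{\mathbb{Q}}[Z_t \mid \mathcal{F}_s] = E^{\mathbb{Q}}[D_s Z_t \mid \mathcal{F}_s] \mathbb{1}_{[0,s]}(s) = E^{\mathbb{Q}}[D_s Z_t \mid \mathcal{F}_s] \quad \text{for } 0 \le s \le t \in [0, T],$$

where the second equality is a consequence of Lemma 4.1. By the Clark-Ocone representation formula
[Nua06, Proposition 1.3.14],

$$Z_t = 1 + \int_0^t E^{\mathbb{Q}}[D_s Z_t \mid \mathcal{F}_s] \, dW_s = 1 + \int_0^t D_s Z_s \, dW_s = 1 + \int_0^t V_s Z_s \, dW_s$$

for $V_s = \frac{D_s Z_s}{Z_s}$. By the chain rule of Malliavin calculus [Nua06, Proposition 1.2.3], $D_s \ln Z_s = \frac{D_s Z_s}{Z_s}$. Using
Lemma 4.1 again, we have $E^{\mathbb{Q}}[D_t Z \mid \mathcal{F}_t] = D_t E^{\mathbb{Q}}[Z \mid \mathcal{F}_t] = D_t Z_t$, which finishes the argument. □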
An application of the chain rule of Malliavin calculus also gives the following corollary.

Corollary 4.3. Suppose $C \in \mathbb{D}^{1,2}$ and let $\beta > 0$. Then the stochastic process $U^*$ defined by

$$U_t^* = -\beta \frac{E^{\mathbb{Q}}[\exp(-\beta C) D_t C \mid \mathcal{F}_t]}{E^{\mathbb{Q}}[\exp(-\beta C) \mid \mathcal{F}_t]}, \quad 0 \le t \le T,$$

solves Problem 3.1.

4.1. Example: $C = \frac{1}{2}(W_T - x^*)^2$, continued. We may apply Corollary 4.3 to the example of Section 3.3
as an alternative way to compute the optimal control process $U^*$. Using the chain rule of Malliavin
calculus,

$$D_t C = \tfrac{1}{2} D_t (W_T - x^*)^2 = (W_T - x^*) D_t W_T = W_T - x^*, \quad 0 \le t \le T,$$

and it is then straightforward to check (using Lemma A.13 (ii)) that

$$-\beta E^{\mathbb{Q}}[\exp(-\beta C) D_t C \mid \mathcal{F}_t] / E^{\mathbb{Q}}[\exp(-\beta C) \mid \mathcal{F}_t]$$

gives the same expression for $U^*$ as the one already obtained using Itô's formula.

4.2. Example: $C = \max_{0 \le t \le T} W_t$. We will now apply the results of Section 4 to an example where Itô's
formula fails. This example therefore shows the strength of the new approach.

Define $M_t := \max_{0 \le s \le t} W_s$ and take $C := M_T$ for some $T > 0$. The optimal density function is $Z^* \propto
\exp(-\beta C)$ and we wish to obtain the density process $E^{\mathbb{Q}}[Z^* \mid \mathcal{F}_t]$. For the distribution of $M_t$ we have (by
[KS91, Section 2.8.A])

$$\mathbb{Q}(M_t \ge a) = \left( \frac{2}{\pi} \right)^{1/2} \int_{a/\sqrt{t}}^\infty \exp(-\xi^2/2) \, d\xi, \quad t > 0, \ a \ge 0.$$



Conditional on $\mathcal{F}_t$, the event $M_T = M_t$ occurs when the maximum over $[t, T]$ does not exceed $y := M_t$.
This has the same probability as the event that the maximum over $[0, T - t]$ does not exceed $y - x$, for
$x := W_t \le y$, so

$$\mathbb{Q}(M_T = M_t \mid W_t = x, M_t = y) = \mathbb{Q}(M_{T-t} \le y - x) = \left( \frac{2}{\pi} \right)^{1/2} \int_0^{(y - x)/\sqrt{T-t}} \exp(-\xi^2/2) \, d\xi.$$
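For $x \le y < z$ the conditional law of $M_T$ therefore consists of an atom at $y$ and a continuous part on $(y, \infty)$:

$$\mathbb{Q}(M_T = M_t \mid W_t = x, M_t = y) = \left( \frac{2}{\pi} \right)^{1/2} \int_0^{(y - x)/\sqrt{T-t}} \exp(-\xi^2/2) \, d\xi,$$

$$\mathbb{Q}(M_T \ge z \mid W_t = x, M_t = y) = \mathbb{Q}(M_{T-t} \ge z - x) = \left( \frac{2}{\pi} \right)^{1/2} \int_{(z - x)/\sqrt{T-t}}^\infty \exp(-\xi^2/2) \, d\xi.$$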



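Therefore the density function of $M_T$ conditional on $\mathcal{F}_t$ is equal to

$$f_{M_T \mid \mathcal{F}_t}(\xi) = \left( \frac{2}{\pi (T - t)} \right)^{1/2} \exp\left( -\frac{(\xi - W_t)^2}{2(T - t)} \right) \quad \text{for } \xi > M_t \ge W_t.$$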



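Write $K := E^{\mathbb{Q}}[\exp(-\beta M_T)]$. We will make use of the error function (erf) and complementary error
function (erfc), defined by

$$\operatorname{erf}(x) := \frac{2}{\sqrt{\pi}} \int_0^x \exp(-\eta^2) \, d\eta, \qquad \operatorname{erfc}(x) := 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty \exp(-\eta^2) \, d\eta, \quad x \ge 0.$$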



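We compute

$$Z_t^* = E^{\mathbb{Q}}[Z^* \mid \mathcal{F}_t] = \frac{1}{K} E^{\mathbb{Q}}[\exp(-\beta M_T) \mid \mathcal{F}_t] = \frac{1}{K} E^{\mathbb{Q}}[\exp(-\beta M_T) \mathbb{1}_{M_T = M_t} \mid \mathcal{F}_t] + \frac{1}{K} E^{\mathbb{Q}}[\exp(-\beta M_T) \mathbb{1}_{M_T > M_t} \mid \mathcal{F}_t]$$

$$= \frac{1}{K} \exp(-\beta M_t) \, \mathbb{Q}(M_T = M_t \mid \mathcal{F}_t) + \frac{1}{K} E^{\mathbb{Q}}[\exp(-\beta M_T) \mathbb{1}_{M_T > M_t} \mid \mathcal{F}_t]$$

$$= \frac{1}{K} \exp(-\beta M_t) \left( \frac{2}{\pi} \right)^{1/2} \int_0^{(M_t - W_t)/\sqrt{T-t}} \exp(-\xi^2/2) \, d\xi + \frac{1}{K} \left( \frac{2}{\pi (T - t)} \right)^{1/2} \int_{M_t}^\infty \exp(-\beta \xi) \exp\left( -\frac{(\xi - W_t)^2}{2(T - t)} \right) d\xi$$

$$= \frac{1}{K} \exp(-\beta M_t) \operatorname{erf}\left( \frac{M_t - W_t}{\sqrt{2(T - t)}} \right) + \frac{1}{K} \exp\left( -\beta W_t + \tfrac{1}{2} \beta^2 (T - t) \right) \operatorname{erfc}\left( \frac{M_t - W_t + \beta (T - t)}{\sqrt{2(T - t)}} \right).$$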
Figure 2. Above: sample path of a Brownian motion and its maximum process, uncontrolled
and controlled ($\beta = 1$). Below: the control policy corresponding to the controlled
Brownian motion. (Legend: uncontrolled Brownian motion; uncontrolled maximum process;
controlled Brownian motion; controlled maximum process.)



An expression for the Malliavin derivative of $M_T$ is available: $D_t M_T = \mathbb{1}_{[0, \tau]}(t)$, where $\tau$ is the a.s. unique
point where $W$ attains its maximum. See [Nua06, Exercise 1.2.11]. Hence by the chain rule for the
Malliavin derivative,

$$D_t Z^* = -\frac{\beta}{K} \exp(-\beta M_T) D_t M_T = -\frac{\beta}{K} \exp(-\beta M_T) \mathbb{1}_{[0, \tau]}(t).$$

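Note that $t \le \tau$ if and only if $M_T > M_t$. We can compute

$$\varphi_t := E^{\mathbb{Q}}[D_t Z^* \mid \mathcal{F}_t] = -\frac{\beta}{K} E^{\mathbb{Q}}\left[ \exp(-\beta M_T) \mathbb{1}_{\{M_T > M_t\}} \mid \mathcal{F}_t \right]$$

$$= -\frac{\beta}{K} \left( \frac{2}{\pi (T - t)} \right)^{1/2} \int_{M_t}^\infty \exp\left( -\beta \xi - \frac{(\xi - W_t)^2}{2(T - t)} \right) d\xi$$

$$= -\frac{\beta}{K} \exp\left( -\beta W_t + \tfrac{1}{2} \beta^2 (T - t) \right) \operatorname{erfc}\left( \frac{M_t - W_t + \beta (T - t)}{\sqrt{2(T - t)}} \right).$$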



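By Theorem 4.2 we conclude that for $U_t^* := \varphi_t / Z_t^*$ we have

$$dZ_t^* = U_t^* Z_t^* \, dW_t,$$

so that $U^*$ is the stochastic process which solves the optimization problem

$$\text{minimize } J(U) := E^U\left[ \max_{0 \le t \le T} W_t + \frac{1}{2\beta} \int_0^T U_s^2 \, ds \right] = E^U\left[ \max_{0 \le t \le T} \left( W_t^U + \int_0^t U_s \, ds \right) + \frac{1}{2\beta} \int_0^T U_s^2 \, ds \right],$$

with $W^U$ a Brownian motion under $\mathbb{P}^U$. Note that $\varphi_t$ and $Z_t^*$, and hence $U_t^*$, are explicitly given in terms
of $t$, $W_t$ and $M_t$. The process $U^*$ may be written 'succinctly' as $U_t^* = u(t, W_t, M_t)$, where

$$u(t, w, m) = \frac{ -\beta \exp\left( -\beta w + \tfrac{1}{2} \beta^2 (T - t) \right) \operatorname{erfc}\left( \frac{m - w + \beta (T - t)}{\sqrt{2(T - t)}} \right) }{ \exp(-\beta m) \operatorname{erf}\left( \frac{m - w}{\sqrt{2(T - t)}} \right) + \exp\left( -\beta w + \tfrac{1}{2} \beta^2 (T - t) \right) \operatorname{erfc}\left( \frac{m - w + \beta (T - t)}{\sqrt{2(T - t)}} \right) },$$

for $0 \le t < T$, $w \in \mathbb{R}$, $m \ge w$.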



An illustration of this control policy and its effect on a sample path of $W$ and $M$ is provided in
Figure 2.
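The feedback law $u(t, w, m)$ is straightforward to evaluate numerically. The sketch below is one possible transcription using the error functions from Python's standard library; the function name and argument order are assumptions made here for illustration. Note that the normalizing constant $K$ cancels in the ratio $\varphi_t / Z_t^*$ and therefore does not appear.

```python
import math

def control_policy(t, w, m, beta, T):
    """Evaluate u(t, w, m) for the cost C = max_{0<=s<=T} W_s.

    t: current time (0 <= t < T), w: current value W_t, m: running maximum M_t >= w.
    """
    if not 0.0 <= t < T:
        raise ValueError("t must satisfy 0 <= t < T")
    if m < w:
        raise ValueError("the running maximum m must be at least w")
    s = T - t                                # remaining time
    root = math.sqrt(2.0 * s)
    # Contribution of the event {M_T = M_t}: the maximum is not exceeded again.
    atom = math.exp(-beta * m) * math.erf((m - w) / root)
    # Contribution of the event {M_T > M_t}: a new maximum is reached after time t.
    tail = math.exp(-beta * w + 0.5 * beta ** 2 * s) * math.erfc((m - w + beta * s) / root)
    return -beta * tail / (atom + tail)

# Hypothetical evaluations with beta = 1 and T = 1.
at_maximum = control_policy(0.5, 0.3, 0.3, 1.0, 1.0)     # W_t sits at its running maximum
far_below = control_policy(0.5, 0.0, 5.0, 1.0, 1.0)      # W_t far below its running maximum
```

Since both contributions are nonnegative, the control always lies in $[-\beta, 0]$: it equals $-\beta$ when $W_t$ is at its running maximum and becomes negligible when $W_t$ is far below it.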
Remark 4.4. This example illustrates how the theory applies to non-Markovian processes, and
therefore provides a method that applies where a naive application of dynamic programming (i.e. the HJB
equation) would fail.

In this particular case, by augmenting the state to $(W_t, M_t)$ the optimal control becomes Markovian,
but this requires a nonstandard application of the HJB equation; see also Remark 5.4 and [HS91] for a
detailed analysis in a closely related class of examples.

5. Relation to classical stochastic optimal control 

In this section we will link the theory of the previous sections to the classical theory of stochastic optimal
control [FR75]. We will list some instances of optimal control problems and explain how the theory of
the previous sections can be applied.



Assume as before the conditions of Hypothesis 3.2, defining a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{Q})$
on which a $p$-dimensional standard Brownian motion $W$ is defined. In classical stochastic optimal
control theory the notion of state is fundamental. The dynamics of the state will be described by a
stochastic differential equation. For this we require the following additional assumptions.



Hypothesis 5.1. Suppose $b : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^n$ and $\sigma : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^{n \times p}$ are

(i) locally Lipschitz, i.e. for every bounded set $B \subset \mathbb{R}^n$ and $T > 0$ there exists a constant $K > 0$ such
that

$$|b(t, x) - b(t, y)| \le K |x - y|, \quad \text{and} \quad \|\sigma(t, x) - \sigma(t, y)\| \le K |x - y|, \quad \text{for all } 0 \le t \le T \text{ and } x, y \in B;$$

(ii) monotone in the following sense: for every $T > 0$ there exists a positive constant $K > 0$ such that
for all $x \in \mathbb{R}^n$ and $t \in [0, T]$,

$$x^T b(t, x) + \tfrac{1}{2} \|\sigma(t, x)\|^2 \le K (1 + |x|^2).$$

Note that the monotonicity condition (ii) above is less restrictive than the linear growth condition which
is more commonly found in the literature.



Under these assumptions, we will consider for $x \in \mathbb{R}^n$ the stochastic differential equation

$$\begin{cases} dX_t = b(t, X_t) \, dt + \sigma(t, X_t) \, dW_t, & t \ge 0, \\ X_0 = x. \end{cases} \tag{12}$$

We may think of (12) as describing the uncontrolled dynamics. We make use of the following result on
the existence of a unique strong solution to (12). See [Mao97, Theorem 2.3.6, Theorem 2.4.1].



Theorem 5.2. Under the conditions of Hypothesis 5. 1 for every x e R n there exists a unique solution ( up 



to indistinguishability), denoted by {Xf) t >o, to (12) satisfying swp Q ^ t ^ T t^\Xf\ 2 < Cj{1 + \x\ 2 ) for every 
T > and some constant cj depending on T. 

Such a process $X^x$ is called a Markov diffusion process. Note that in general the Markov process is time
inhomogeneous since the dynamics depend explicitly on time through $b$ and $\sigma$. If $b$ and $\sigma$ do not
depend explicitly on $t$ then $X^x$ is called a time homogeneous Markov diffusion process.
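
To make the uncontrolled dynamics (12) concrete, the following is a minimal simulation sketch using the Euler-Maruyama scheme. The scheme, the function name `euler_maruyama` and the example coefficients are illustrative assumptions and are not part of the theory above; the discretization only approximates the strong solution of Theorem 5.2.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one approximate path of dX = b(t, X) dt + sigma(t, X) dW on [0, T].

    b(t, x) returns a vector in R^n, sigma(t, x) an (n, p) matrix.
    """
    x0 = np.asarray(x0, dtype=float)
    dt = T / n_steps
    path = np.empty((n_steps + 1, x0.size))
    path[0] = x0
    for k in range(n_steps):
        t = k * dt
        s = np.asarray(sigma(t, path[k]), dtype=float)
        # Increment of the p-dimensional Brownian motion over [t, t + dt].
        dW = rng.normal(0.0, np.sqrt(dt), size=s.shape[1])
        path[k + 1] = path[k] + np.asarray(b(t, path[k]), dtype=float) * dt + s @ dW
    return path

# Hypothetical example: one-dimensional mean-reverting drift b(t, x) = -x, sigma = 1.
rng = np.random.default_rng(0)
path = euler_maruyama(lambda t, x: -x, lambda t, x: np.array([[1.0]]),
                      x0=[1.0], T=1.0, n_steps=1000, rng=rng)
```

The example coefficients are globally Lipschitz, so they satisfy both conditions of Hypothesis 5.1.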

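We consider the set of Markov controls $\mathcal{U}_M$, which consists of mappings $u \in B([0,\infty) \times \mathbb{R}^n; \mathbb{R}^p)$ such that
for all $x \in \mathbb{R}^n$ the stochastic process $U^x$ defined by $U^x_s(\omega) := u(s, X^x_s(\omega))$ is in $\mathcal{U}$. For $u \in \mathcal{U}_M$ we will
write $\mathbb{P}^{x,u} := \mathbb{P}^{U^x}$, with $U^x$ as above, and similarly $\mathbb{E}^{x,u} = \mathbb{E}^{U^x}$. Note that $\mathbb{P}^{x,u}$ depends on $x$ through the
definition of $U^x$.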

For $u \in \mathcal{U}_M$, the process $(X^x_t)_{t \geq 0}$ also satisfies the following SDE, which we can think of as the
controlled dynamics:

$$dX_t = \big[ b(t, X_t) + \sigma(t, X_t) u(t, X_t) \big]\, dt + \sigma(t, X_t)\, dW^{x,u}_t, \quad t \geq 0, \qquad X_0 = x, \tag{13}$$

where by Girsanov's theorem [KS91, Corollary 3.5.2], as before, the process $(W^{x,u}_t)$ defined by $W^{x,u}_t :=
W_t - \int_0^t u(s, X^x_s)\, ds$ is a standard Brownian motion with respect to the probability measure $\mathbb{P}^{x,u}$.

We will specialize to the situation where the cost random variable $C$ also depends on the initial
condition and is a functional of the paths of the stochastic process $X^x$ for $x \in \mathbb{R}^n$. We will write $C^x(\omega) :=
c(X^x(\omega))$ for $\omega \in \Omega$, where $c : C([0,\infty); \mathbb{R}^n) \to \mathbb{R}$. The following examples of cost functionals are often
used. Here $\xi$ denotes any path in $C([0,\infty); \mathbb{R}^n)$.

(i) Finite time horizon problem: $c(\xi) = \int_0^T \varphi(t, \xi(t))\, dt + \psi(\xi(T))$ for some $T > 0$, with $\varphi \in B([0,T] \times
\mathbb{R}^n; \mathbb{R})$, $\psi \in B(\mathbb{R}^n; \mathbb{R})$;

(ii) Infinite time horizon problem with exit from a region: $c(\xi) = \int_0^\tau \varphi(t, \xi(t))\, dt + \psi(\tau, \xi(\tau))$, where
$\tau = \inf\{t > 0 : \xi(t) \notin G\}$ with $G \subset \mathbb{R}^n$ open, $\varphi, \psi \in B([0,\infty) \times \mathbb{R}^n; \mathbb{R})$.

The method outlined in the previous sections will enable us to find solutions to these problems, under 
certain conditions. These solutions will be compared to the solutions obtained by classical methods. 



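Define as before the total cost function by

$$J(x; u) := \mathbb{E}^{x,u} C^x + \frac{1}{\beta} \mathcal{H}(\mathbb{P}^{x,u}; \mathbb{Q}) = \mathbb{E}^{x,u} \left[ c(X^x) + \frac{1}{2\beta} \int_0^\infty |u(s, X^x_s)|^2\, ds \right] \tag{14}$$

for $x \in \mathbb{R}^n$, which is now specialized to Markov controls $u \in \mathcal{U}_M$. Recall that $X^x$ satisfies (13) with $W^{x,u}$ a
standard Brownian motion under $\mathbb{P}^{x,u}$. In this setting, we consider the following problem.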
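
The second equality in (14) rests on the Girsanov density of $\mathbb{P}^{x,u}$ with respect to $\mathbb{Q}$. A short formal sketch, assuming $u$ is sufficiently integrable for the stochastic integral with respect to $W^{x,u}$ to have zero mean under $\mathbb{P}^{x,u}$, and using $dW_s = dW^{x,u}_s + u(s, X^x_s)\, ds$:

```latex
\begin{align*}
\frac{d\mathbb{P}^{x,u}}{d\mathbb{Q}}
  &= \exp\Big( \int_0^\infty u(s,X^x_s)^T \, dW_s
       - \tfrac12 \int_0^\infty |u(s,X^x_s)|^2 \, ds \Big) \\
  &= \exp\Big( \int_0^\infty u(s,X^x_s)^T \, dW^{x,u}_s
       + \tfrac12 \int_0^\infty |u(s,X^x_s)|^2 \, ds \Big), \\
\mathcal{H}(\mathbb{P}^{x,u};\mathbb{Q})
  &= \mathbb{E}^{x,u} \ln \frac{d\mathbb{P}^{x,u}}{d\mathbb{Q}}
   = \tfrac12 \, \mathbb{E}^{x,u} \int_0^\infty |u(s,X^x_s)|^2 \, ds .
\end{align*}
```

Dividing by $\beta$ gives the quadratic control cost appearing in (14).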

Problem 5.3 (Relative entropy weighted optimal control of Markov diffusion processes). Find the value
function $J^* : \mathbb{R}^n \to \mathbb{R}$, defined by

$$J^*(x) = \inf_{u \in \mathcal{U}_M} J(x; u) \quad \text{for } x \in \mathbb{R}^n,$$

and, in case it exists, the optimal control policy $u^* \in \mathcal{U}_M$ which satisfies

$$J^*(x) = J(x; u^*) \quad \text{for all } x \in \mathbb{R}^n.$$



It should now be clear that solving the relative entropy weighted Problem 5.3 is equivalent to solving
a classical stochastic optimal control problem with quadratic control costs and with dynamics given
by (13).

Remark 5.4. We cannot always expect to find Markov controls that are optimal in the sense of the more
general Problem 3.1. For example, in Section 4.2, where a solution was computed for minimization of
$M_T = \max_{0 \leq t \leq T} W_t$, the optimal stochastic process depends not only on time $t$ and the current state
$W_t$ but also on the running maximum $M_t$. By augmenting the state to $(W_t, M_t)$ the optimal control
becomes Markovian. However, the time evolution of $(W_t, M_t)$ cannot be put in the shape of (12). In fact,
the process $X$ defined by $X_t = M_t - W_t$, $0 \leq t \leq T$, has the same probability law as a Brownian motion
reflected at the origin and satisfies a Skorohod equation; see [KS91, Section 3.6.C].

Remark 5.5. Often in the theory of stochastic optimal control, the cost function $J(x)$ is defined to
depend explicitly on some starting time $t_0$. This explicit dependence on starting time is useful in obtaining
the optimal control through the dynamic programming principle (i.e. the Hamilton-Jacobi-Bellman
equation). Since we do not use the dynamic programming principle we do not need to consider the
value function for all initial times $t_0$. In our setup we always have $t_0 = 0$ to avoid confusion, without loss
of generality.

5.1. Linearized Hamilton-Jacobi-Bellman equation. In this section sufficient conditions are obtained
in order for the optimal control $U^*$ to be a Markov policy, so that $U^*_t = u(t, X_t)$.

Let $G \subset \mathbb{R}^n$ be open and let $\tau^x := \inf\{t > 0 : X^x_t \notin G\}$ denote the exit time from $G$. Note that $\tau^x$ is a stopping
time. We will study two cases: the finite horizon problem and the time homogeneous exit problem.

5.1.1. Finite horizon case. Let $T < \infty$ and define the stopping time $\tau^x_T := \tau^x \wedge T$. Let $C^x = \int_0^{\tau^x_T} \varphi(t, X^x_t)\, dt +
\psi(\tau^x_T, X^x_{\tau^x_T})$ with $\varphi, \psi \in B_b([0,\infty) \times \mathbb{R}^n)$, so that the integral exists and is finite. Let $L$ denote the Kolmogorov
backward operator corresponding to (12), given by



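$$Lf(t,x) = \frac{\partial f}{\partial t}(t,x) + \sum_{i=1}^n b^i(t,x) \frac{\partial f}{\partial x^i}(t,x) + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \big[ \sigma(t,x) \sigma(t,x)^T \big]^{ij} \frac{\partial^2 f}{\partial x^i \partial x^j}(t,x) \tag{15}$$

for $f \in C^{1,2}([0,T] \times G; \mathbb{R})$.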



Theorem 5.6 (Linearized Hamilton-Jacobi-Bellman equation - Finite horizon). Let Hypothesis 5.1 be
satisfied. Suppose that $y \in C^{1,2}([0,T] \times G; \mathbb{R})$ satisfies the PDE

$$\begin{aligned} Ly(t,x) - \beta \varphi(t,x) y(t,x) &= 0, && t \in [0,T],\ x \in G, \\ y(t,x) &= \exp(-\beta \psi(t,x)), && x \in \partial G \text{ or } t = T. \end{aligned} \tag{16}$$

Suppose that, $\mathbb{Q}$-almost surely, $y(t, X^x_t) > 0$ for $0 \leq t \leq \tau^x_T$ and $x \in G$. Define

$$u^*(t,x) := \frac{\sigma(t,x)^T \nabla y(t,x)}{y(t,x)}$$

for all $(t,x) \in [0,T] \times G$ for which $y(t,x) \neq 0$. Then any measurable extension of $u^*$ to $[0,T] \times G$ solves
Problem 5.3. Furthermore the value function of Problem 5.3 is given by $J^*(x) = -\frac{1}{\beta} \ln y(0,x)$, for $x \in G$.



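Proof. Fix $x \in \mathbb{R}^n$ and omit the '$x$'-superscripts in $X^x_t$, $C^x$ etc. and the '$*$'-superscript in $u^*$. Define
$Y_t := y(t, X_t)$. Then by Itô's formula, (16) and the definition of $u$,

$$\begin{aligned} dY_t &= Ly(t, X_t)\, dt + \sum_{i=1}^n \sum_{j=1}^p \sigma^{ij}(t, X_t) \frac{\partial y}{\partial x^i}(t, X_t)\, dW^j_t \\ &= \beta \varphi(t, X_t) y(t, X_t)\, dt + y(t, X_t) \sum_{j=1}^p u^j(t, X_t)\, dW^j_t \\ &= \beta \varphi(t, X_t) Y_t\, dt + Y_t \sum_{j=1}^p u^j(t, X_t)\, dW^j_t. \end{aligned}$$

It may be checked that a solution for this SDE is given by

$$y(t, X_t) = Y_t = Y_0 \exp\left( \beta \int_0^t \varphi(s, X_s)\, ds + \sum_{j=1}^p \int_0^t u^j(s, X_s)\, dW^j_s - \frac{1}{2} \int_0^t |u(s, X_s)|^2\, ds \right).$$

By the boundary condition,

$$Y_{\tau_T} = \exp(-\beta \psi(\tau_T, X_{\tau_T})) = Y_0 \exp\left( \beta \int_0^{\tau_T} \varphi(s, X_s)\, ds + \sum_{j=1}^p \int_0^{\tau_T} u^j(s, X_s)\, dW^j_s - \frac{1}{2} \int_0^{\tau_T} |u(s, X_s)|^2\, ds \right).$$

Multiplying by $\exp\big( -\beta \int_0^{\tau_T} \varphi(s, X_s)\, ds \big)$ yields

$$\exp\left( -\beta \int_0^{\tau_T} \varphi(s, X_s)\, ds - \beta \psi(\tau_T, X_{\tau_T}) \right) = Y_0 \exp\left( \sum_{j=1}^p \int_0^{\tau_T} u^j(s, X_s)\, dW^j_s - \frac{1}{2} \int_0^{\tau_T} |u(s, X_s)|^2\, ds \right).$$

By taking expectations, $y(0,x) = Y_0 = \mathbb{E}^{\mathbb{Q}} \exp\big( -\beta \int_0^{\tau_T} \varphi(s, X_s)\, ds - \beta \psi(\tau_T, X_{\tau_T}) \big) = \mathbb{E}^{\mathbb{Q}} \exp(-\beta C)$. By the
proof of Theorem 3.3 we may conclude that $U_t := u(t, X_t)$ solves Problem 3.1 and therefore that $u$ solves
Problem 5.3. □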
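
As an illustration of Theorem 5.6, consider the hypothetical one-dimensional example $b = 0$, $\sigma = 1$, $\varphi = 0$, $\psi(x) = x^2/2$ and $G = \mathbb{R}$. This example is an assumption made for illustration only: $\psi$ is unbounded, so it falls outside the boundedness conditions stated above. A Gaussian integral gives the candidate solution $y(t,x) = (1 + \beta(T-t))^{-1/2} \exp\big(-\beta x^2 / (2(1 + \beta(T-t)))\big)$. The sketch below checks the PDE (16) by finite differences and evaluates the resulting control and value function.

```python
import numpy as np

beta, T = 2.0, 1.0  # illustrative values, not taken from the text

def y(t, x):
    """Candidate solution of (16) for b = 0, sigma = 1, phi = 0, psi(x) = x^2 / 2."""
    s = 1.0 + beta * (T - t)
    return np.exp(-beta * x**2 / (2.0 * s)) / np.sqrt(s)

def pde_residual(t, x, h=1e-3):
    """Central finite-difference residual of dy/dt + (1/2) d^2y/dx^2 at (t, x)."""
    y_t = (y(t + h, x) - y(t - h, x)) / (2.0 * h)
    y_xx = (y(t, x + h) - 2.0 * y(t, x) + y(t, x - h)) / h**2
    return y_t + 0.5 * y_xx

def u_star(t, x, h=1e-5):
    """Markov control sigma^T grad(y) / y, which here reduces to y_x / y."""
    return (y(t, x + h) - y(t, x - h)) / (2.0 * h) / y(t, x)

def value(x):
    """Value function J*(x) = -(1 / beta) ln y(0, x)."""
    return -np.log(y(0.0, x)) / beta

print(pde_residual(0.3, 0.7), u_star(0.3, 0.7), value(0.7))
```

In this example the control is the linear feedback $u^*(t,x) = -\beta x / (1 + \beta(T-t))$, which steers the state towards the origin, where the terminal cost is smallest.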

Remark 5.7. Assume for simplicity that $a := \sigma \sigma^T$ and $b$ are globally Lipschitz continuous and bounded,
$\varphi$ is bounded and uniformly Hölder continuous, and $\psi$ is bounded and continuous on $([0,T] \times \partial G) \cup (\{T\} \times G)$. Note
that under the following further assumptions (16) has a unique solution that may be represented by the
Feynman-Kac formula,

$$y(t,x) = \mathbb{E}^{t,x} \left[ \exp\left( -\beta \int_t^{\tau^x_T \vee t} \varphi(s, X_s)\, ds - \beta \psi\big( \tau^x_T \vee t, X_{\tau^x_T \vee t} \big) \right) \right] \tag{17}$$

(where $\mathbb{E}^{t,x}$ denotes expectation with respect to the law of $X$ satisfying (12) with initial condition $X_t = x$):

(i) nondegenerate case: $G$ is open and bounded with smooth boundary, or $G = \mathbb{R}^n$, and $a$ is
uniformly elliptic, i.e. $z^T a(t,x) z \geq \mu |z|^2$ for some $\mu > 0$ and all $z \in \mathbb{R}^n$, $t \in [0,T]$ and $x \in G$ (see
[Mao97, Theorems 2.8.2 and 2.8.3]); or

(ii) degenerate case: $G = \mathbb{R}^n$, and $b$, $\sigma$, $\varphi$ and $\psi$ are homogeneous in $t$, in which case the Markov
semigroup corresponding to $X$ is a Feller-Dynkin semigroup (see [Kal02, Theorems 21.11 and 24.11]).

In case (17) holds, it follows immediately (by boundedness of $\varphi$ and $\psi$) that $y > 0$ on its domain.



Remark 5.8. The expression (16) may be seen as a linearized version of the Hamilton-Jacobi-Bellman
(HJB) equation. This linearized HJB equation may alternatively be obtained from the nonlinear HJB
equation [FR75, Theorem 4.1], by applying a logarithmic transform (see [Kap05]). This observation
formed the starting point for the research of this paper.
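
A formal sketch of this logarithmic transform follows. It assumes that the value function $J(t,x)$ of the classical problem with dynamics (13) and running cost $\varphi + |u|^2/(2\beta)$ is smooth, and writes $a := \sigma \sigma^T$. The HJB equation is written here in its standard form as an assumption, not quoted from [FR75].

```latex
\begin{align*}
0 &= \partial_t J + \tfrac12 \operatorname{tr}\big(a \nabla^2 J\big) + \varphi
     + \min_{u} \Big\{ (b + \sigma u)^T \nabla J + \tfrac{1}{2\beta} |u|^2 \Big\} \\
  &= \partial_t J + b^T \nabla J + \tfrac12 \operatorname{tr}\big(a \nabla^2 J\big) + \varphi
     - \tfrac{\beta}{2} \big|\sigma^T \nabla J\big|^2,
     \qquad u^* = -\beta \sigma^T \nabla J .
\end{align*}
% Substitute J = -(1/beta) ln y:
\begin{align*}
\partial_t J &= -\frac{\partial_t y}{\beta y}, \qquad
\nabla J = -\frac{\nabla y}{\beta y}, \qquad
\nabla^2 J = -\frac{\nabla^2 y}{\beta y} + \frac{\nabla y \, \nabla y^T}{\beta y^2}, \\
\tfrac12 \operatorname{tr}\big(a \nabla^2 J\big) - \tfrac{\beta}{2} \big|\sigma^T \nabla J\big|^2
  &= -\frac{\operatorname{tr}\big(a \nabla^2 y\big)}{2 \beta y}
     + \frac{|\sigma^T \nabla y|^2}{2 \beta y^2} - \frac{|\sigma^T \nabla y|^2}{2 \beta y^2}
   = -\frac{\operatorname{tr}\big(a \nabla^2 y\big)}{2 \beta y}, \\
0 &= -\frac{1}{\beta y} \Big( \partial_t y + b^T \nabla y
     + \tfrac12 \operatorname{tr}\big(a \nabla^2 y\big) - \beta \varphi y \Big),
     \qquad u^* = \frac{\sigma^T \nabla y}{y} .
\end{align*}
```

The quadratic gradient terms cancel exactly, which leaves the linear equation $Ly - \beta \varphi y = 0$ of (16) together with the control of Theorem 5.6.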

Remark 5.9. Note that if we do not restrict ourselves to Markov controls, the value function is given
by (4), i.e. $J^*(x) = -\frac{1}{\beta} \ln \mathbb{E}^{\mathbb{Q}} \exp(-\beta C^x)$, which may be compared to (17). Theorem 5.6 is therefore partly
a restatement of the Feynman-Kac solution to the corresponding Dirichlet and Cauchy problems [KS91,
Section 5.7] (and the proof is largely similar). But note that the theorem also states the existence of a
Markov control and gives an expression for the optimal Markov control $u^*$ in this case.



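5.1.2. Exit problem. We will now consider the time homogeneous case where $b$ and $\sigma$ do not depend on
$t$, with $G$ a bounded open subset of $\mathbb{R}^n$ with smooth boundary $\partial G$. Let $L$ denote the time homogeneous
Kolmogorov backward operator corresponding to (12), given by

$$Lf(x) = \sum_{i=1}^n b^i(x) \frac{\partial f}{\partial x^i}(x) + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \big[ \sigma(x) \sigma(x)^T \big]^{ij} \frac{\partial^2 f}{\partial x^i \partial x^j}(x) \tag{18}$$

for $f \in C^2(G; \mathbb{R})$.

Define the cost random variable by $C^x := \int_0^{\tau^x} \varphi(X^x_s)\, ds + \psi(X^x_{\tau^x})$.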



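Analogously to Theorem 5.6 the following result can be established.

Theorem 5.10. Let Hypothesis 5.1 be satisfied. Suppose that $y \in C^2(G; \mathbb{R})$ satisfies the PDE

$$\begin{aligned} Ly(x) - \beta \varphi(x) y(x) &= 0, && x \in G, \\ y(x) &= \exp(-\beta \psi(x)), && x \in \partial G. \end{aligned} \tag{19}$$

Suppose furthermore that $y(x) > 0$ for $x \in G$. Define

$$u^*(x) := \frac{\sigma(x)^T \nabla y(x)}{y(x)}$$

for all $x \in G$. Then $u^*$ solves Problem 5.3. Furthermore the value function of Problem 5.3
is given by $J^*(x) = -\frac{1}{\beta} \ln y(x)$, for $x \in G$. In particular, $y$ may be expressed as

$$y(x) = \mathbb{E}^{\mathbb{Q}} \big[ \exp(-\beta C^x) \big] = \mathbb{E}^{\mathbb{Q}} \exp\left( -\beta \int_0^{\tau^x} \varphi(X^x_s)\, ds - \beta \psi(X^x_{\tau^x}) \right). \tag{20}$$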

Remark 5.11. For example in the case of bounded $G$, with uniform ellipticity of $a := \sigma \sigma^T$, with $\sigma$ and
$b$ globally Lipschitz, $\varphi$ bounded, uniformly Hölder continuous and nonnegative, and $\psi$ continuous on
$\partial G$, by [Mao97, Theorem 2.8.1] a unique solution to (19) exists and is hence given by (20), guaranteeing
positivity of $y$ on $G$.



Appendix A. Appendix



A.1. Relative entropy. In the following, $\mu$ and $\nu$ denote probability measures over some measurable
space $(S, \Sigma)$.

Definition A.1. Suppose, for all $A \in \Sigma$, that $\mu(A) = 0 \Rightarrow \nu(A) = 0$. We then say that $\nu$ is absolutely
continuous with respect to $\mu$. This is denoted as $\nu \ll \mu$. If both $\mu \ll \nu$ and $\nu \ll \mu$ then we say that $\mu$ and $\nu$ are
equivalent.



We recall the classical notion of Radon-Nikodym derivative [Wil91, Theorem 14.13].

Theorem A.2 (Radon-Nikodym derivative). If $\nu \ll \mu$ then there exists a random variable $X \in \mathcal{L}^1(S, \Sigma, \mu)$
such that

$$\nu(A) = \int_A X\, d\mu \quad \text{for all } A \in \Sigma.$$

This variable $X$ is called a version of the Radon-Nikodym derivative of $\nu$ relative to $\mu$ on $(S, \Sigma)$, and
different versions agree $\mu$-almost surely. We write

$$\frac{d\nu}{d\mu} := X \quad \text{on } \Sigma, \ \mu\text{-a.s.}$$

Definition A.3. We call a $\Sigma$-measurable nonnegative random variable $f$ a density (function) with respect
to $\mu$ if there exists a probability measure $\nu$ that has Radon-Nikodym derivative $f$ relative to $\mu$.

Definition A.4. The relative entropy of $\mu$ with respect to $\nu$ is defined as

$$\mathcal{H}(\mu; \nu) := \begin{cases} \int_S \ln\left( \frac{d\mu}{d\nu} \right) d\mu & \text{if } \mu \text{ is absolutely continuous with respect to } \nu, \\ \infty, & \text{otherwise.} \end{cases}$$

Relative entropy is also known as Kullback-Leibler divergence; for this paper we use the term 'relative
entropy' since it seems to be better known in the mathematics community. In general, $\mathcal{H}(\mu; \nu)$ is not
symmetric in $\mu$ and $\nu$.

The following proposition summarizes some useful properties of relative entropy. In particular, it indi- 
cates that relative entropy is a good indication of how similar two probability measures are. 

Proposition A.5. (i) The relative entropy $\mathcal{H}(\mu; \nu)$ is well-defined (i.e. the integrals exist) and
assumes its values within $[0, \infty]$.

(ii) $\mathcal{H}(\mu; \nu) = 0$ if and only if $\mu = \nu$ on $S$, $\mu$-almost everywhere.

(iii) $\mathcal{H}(\mu; \nu)$ is strictly convex in $\mu$.

Proof. See [DE97, Section 1.4]. □
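
For measures on a finite set the definition reduces to a finite sum, which makes the properties above easy to inspect. The following is a minimal sketch for discrete probability vectors; the function name `relative_entropy` and the example vectors are illustrative assumptions.

```python
import numpy as np

def relative_entropy(mu, nu):
    """Relative entropy H(mu; nu) of discrete probability vectors mu and nu.

    Returns inf when mu is not absolutely continuous with respect to nu.
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    if np.any((nu == 0) & (mu > 0)):
        return np.inf
    support = mu > 0  # points with mu = 0 contribute nothing (convention 0 ln 0 = 0)
    return float(np.sum(mu[support] * np.log(mu[support] / nu[support])))

mu = [0.5, 0.5, 0.0]
nu = [0.25, 0.25, 0.5]
print(relative_entropy(mu, nu))  # finite: mu is absolutely continuous w.r.t. nu
print(relative_entropy(nu, mu))  # infinite: nu charges a point that mu does not
```

The two calls illustrate the lack of symmetry: here $\mathcal{H}(\mu; \nu) = \ln 2$ while $\mathcal{H}(\nu; \mu) = \infty$.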

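A.2. Exponents of Gaussian random variables. For convenience we state, without proof, the following
lemma. (Note that the second identity is obtained by differentiating the first identity with respect to $\alpha$
and then setting $\alpha = 0$.)

Lemma A.6. Suppose $Y$ is a real-valued random variable that is normally distributed with mean $\mu \in \mathbb{R}$
and variance $\sigma^2 > 0$. Let $\alpha \in \mathbb{R}$ and $\gamma \in \mathbb{R}$ such that $1 + \gamma \sigma^2 > 0$. Then

(i) $\mathbb{E}\big[ \exp\big( \alpha Y - \tfrac{1}{2} \gamma Y^2 \big) \big] = \dfrac{1}{\sqrt{1 + \gamma \sigma^2}} \exp\left( \dfrac{2 \alpha \mu + \alpha^2 \sigma^2 - \gamma \mu^2}{2 (1 + \gamma \sigma^2)} \right)$,

(ii) $\mathbb{E}\big[ Y \exp\big( -\tfrac{1}{2} \gamma Y^2 \big) \big] = \dfrac{\mu}{(1 + \gamma \sigma^2)^{3/2}} \exp\left( -\dfrac{\gamma \mu^2}{2 (1 + \gamma \sigma^2)} \right)$.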
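
Identities of this type can be checked numerically. The sketch below compares $\mathbb{E}[\exp(\alpha Y - \tfrac12 \gamma Y^2)]$, computed by the trapezoidal rule, with the closed form $(1 + \gamma \sigma^2)^{-1/2} \exp\big( (2 \alpha \mu + \alpha^2 \sigma^2 - \gamma \mu^2) / (2 (1 + \gamma \sigma^2)) \big)$ obtained by completing the square in the Gaussian integral. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def lhs(alpha, gamma, mu, sigma, n=200001, width=12.0):
    """E[exp(alpha*Y - gamma*Y^2/2)] for Y ~ N(mu, sigma^2), by the trapezoidal rule."""
    y = np.linspace(mu - width * sigma, mu + width * sigma, n)
    density = np.exp(-(y - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    f = np.exp(alpha * y - 0.5 * gamma * y**2) * density
    return float(np.sum((f[1:] + f[:-1]) * np.diff(y)) / 2.0)

def rhs(alpha, gamma, mu, sigma):
    """Closed form from completing the square; needs 1 + gamma * sigma^2 > 0."""
    s = 1.0 + gamma * sigma**2
    exponent = (2 * alpha * mu + alpha**2 * sigma**2 - gamma * mu**2) / (2 * s)
    return float(np.exp(exponent) / np.sqrt(s))

print(lhs(0.3, 0.8, 0.5, 1.2), rhs(0.3, 0.8, 0.5, 1.2))
```

Differentiating `rhs` with respect to `alpha` at `alpha = 0` gives the corresponding expression for $\mathbb{E}[Y \exp(-\tfrac12 \gamma Y^2)]$.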

Acknowledgements. We are grateful to dr. ir. O. van Gaans (Mathematical Institute, Leiden University)
for his review of our work and his helpful suggestions that enabled us to improve the exposition.